1. INTRODUCTION*


In practical applications of pattern recognition, such as remote sensing, one 
of the difficult problems is obtaining the labels for training patterns. 
Labeling training patterns is costly, and very often the labels are imperfect. 

In the recent literature, several authors investigated the problem of pattern 
recognition with imperfectly labeled patterns. Duda and Singleton [1] showed 
that, for orthogonal pattern vectors, the average weight vector of a threshold 
logical unit converges to a solution weight vector for the correctly labeled 
pattern set. Whitney and Dwyer [2] obtained error bounds in a two-class 
situation on the performance of a nearest neighbor rule with an imperfect 
teacher. Kashyap and Blaydon [3] proposed an iterative training procedure 
for a two-class case. Gimlin and Ferrell [4] studied the correction of 
labels using a nearest neighbor procedure. Shanmugam and Breipohl [5] 
proposed an error-correcting procedure for disjoint densities using Parzen 
estimators. Chittineni [6,7,8] investigated the applicability of probabilistic
distance measures for feature selection with imperfectly labeled
patterns.

This paper considers the problem of learning with imperfectly labeled patterns.
In section 2, we develop a model for the imperfectly labeled patterns.
Section 3 presents an analysis of the Bayes classifier error with and without
imperfections in the labels. In section 4, we obtain bounds on the performance
of nearest neighbor classifiers for a multiclass case. The training of
a classifier with and without imperfections is discussed in section 5, and
schemes for the correction of imperfections in the labels are developed in
section 6. Section 7 presents expressions for success probability as a
function of time for the one-dimensional classifier, and section 8 treats
feature selection criteria with imperfect labels.

*This document was prepared originally in January 1979 for submission to the
Institute of Electrical and Electronics Engineers (IEEE) Journal. Thus, 
its format conforms to the IEEE requirements and is not consistent with 
standard Lockheed Electronics Company, Inc., document specifications. 
2. A MODEL FOR LABELING IMPERFECTIONS


Let $\omega$ and $\hat{\omega}$ be the perfect and imperfect labels, respectively, each of which
takes values 1, 2, ..., M, where M is the number of classes. Let $P(\omega = i)$
and $p(X \mid \omega = i)$ be the a priori probabilities and class conditional densities
of the patterns in classes $\omega = i$.

Assuming that the imperfections in the labels are described by the
probabilities

$$\beta_{ji} = P(\hat{\omega} = i \mid \omega = j); \quad i, j = 1, 2, \ldots, M \tag{2-1}$$

where i and j indicate class, we have the constraint

$$\sum_{i=1}^{M} \beta_{ji} = 1 \tag{2-2}$$

Assume further that

$$p(X \mid \hat{\omega} = i, \omega = j) = p(X \mid \omega = j) \tag{2-3}$$

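In order to find the relationship between $p(X \mid \hat{\omega} = i)$ and $p(X \mid \omega = i)$, consider

$$\begin{aligned}
p(X \mid \hat{\omega} = i) &= \frac{1}{P(\hat{\omega} = i)} \sum_{j=1}^{M} p(X, \hat{\omega} = i, \omega = j) \\
&= \frac{1}{P(\hat{\omega} = i)} \sum_{j=1}^{M} p(X \mid \hat{\omega} = i, \omega = j)\, P(\hat{\omega} = i \mid \omega = j)\, P(\omega = j) \\
&= \frac{1}{P(\hat{\omega} = i)} \sum_{j=1}^{M} \beta_{ji}\, P(\omega = j)\, p(X \mid \omega = j)
\end{aligned} \tag{2-4}$$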

Cross-multiplying and dividing equation (2-4) by p(X) establishes the
relationship between the a posteriori probabilities

$$P(\hat{\omega} = i \mid X) = \sum_{j=1}^{M} \beta_{ji}\, P(\omega = j \mid X) \tag{2-5}$$




Similarly, the a priori probabilities are related as

$$P(\hat{\omega} = i) = \sum_{j=1}^{M} \beta_{ji}\, P(\omega = j) \tag{2-6}$$
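As an illustration of equations (2-5) and (2-6), the following minimal Python sketch maps perfect-label probabilities to imperfect-label probabilities. The function name `imperfect_from_perfect`, the layout `beta[j][i]` for $\beta_{ji}$, and the numerical values are assumptions made for this example only; they do not come from the paper.

```python
def imperfect_from_perfect(beta, perfect):
    """Apply equations (2-5)/(2-6): out[i] = sum_j beta[j][i] * perfect[j].

    beta[j][i] is assumed to hold P(imperfect label = i | perfect label = j);
    `perfect` holds either the a priori or the a posteriori probabilities.
    """
    m = len(beta)
    for row in beta:
        # Constraint (2-2): each perfect class spreads its mass over all labels.
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of beta must sum to 1, equation (2-2)")
    return [sum(beta[j][i] * perfect[j] for j in range(m)) for i in range(m)]


# Hypothetical two-class example (values chosen for illustration only).
beta = [[0.9, 0.1],
        [0.2, 0.8]]
priors = [0.6, 0.4]
noisy_priors = imperfect_from_perfect(beta, priors)
```

The same function can be applied pattern by pattern to a vector of a posteriori probabilities, since (2-5) and (2-6) share the same linear form.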


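Inverting equation (2-4) yields the following result for a two-class case.

$$\begin{aligned}
P(\omega = 1)\, p(X \mid \omega = 1) &= \frac{1}{\beta_{11}\beta_{22} - \beta_{12}\beta_{21}} \left[ \beta_{22}\, P(\hat{\omega} = 1)\, p(X \mid \hat{\omega} = 1) - \beta_{21}\, P(\hat{\omega} = 2)\, p(X \mid \hat{\omega} = 2) \right] \\
P(\omega = 2)\, p(X \mid \omega = 2) &= \frac{1}{\beta_{11}\beta_{22} - \beta_{12}\beta_{21}} \left[ \beta_{11}\, P(\hat{\omega} = 2)\, p(X \mid \hat{\omega} = 2) - \beta_{12}\, P(\hat{\omega} = 1)\, p(X \mid \hat{\omega} = 1) \right]
\end{aligned} \tag{2-7}$$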

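Similarly, for the a priori and a posteriori probabilities,

$$P(\omega = i) = \frac{1}{\beta_{11}\beta_{22} - \beta_{12}\beta_{21}} \left[ \beta_{jj}\, P(\hat{\omega} = i) - \beta_{ji}\, P(\hat{\omega} = j) \right]; \quad i, j = 1, 2; \; i \neq j \tag{2-8}$$

$$P(\omega = i \mid X) = \frac{1}{\beta_{11}\beta_{22} - \beta_{12}\beta_{21}} \left[ \beta_{jj}\, P(\hat{\omega} = i \mid X) - \beta_{ji}\, P(\hat{\omega} = j \mid X) \right]; \quad i, j = 1, 2; \; i \neq j \tag{2-9}$$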
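The two-class inversion in equations (2-8) and (2-9) can be sketched as follows. This is an illustrative Python fragment, not code from the paper; the function name and the numbers are hypothetical, and `beta` is assumed to be laid out as `((b11, b12), (b21, b22))`.

```python
def perfect_from_imperfect_2class(beta, imperfect):
    """Invert the two-class mislabeling model, equations (2-8)/(2-9).

    `imperfect` holds the two probabilities (a priori or a posteriori)
    measured with the imperfect labels.
    """
    (b11, b12), (b21, b22) = beta
    det = b11 * b22 - b12 * b21
    if abs(det) < 1e-12:
        # With a zero determinant the imperfect labels carry no class information.
        raise ValueError("beta11*beta22 - beta12*beta21 must be nonzero")
    q1, q2 = imperfect
    return [(b22 * q1 - b21 * q2) / det, (b11 * q2 - b12 * q1) / det]


# Hypothetical values: imperfect priors produced by beta from perfect priors (0.6, 0.4).
beta = [[0.9, 0.1], [0.2, 0.8]]
recovered = perfect_from_imperfect_2class(beta, [0.62, 0.38])
```

If estimated inputs are noisy, the recovered values may fall slightly outside [0, 1]; how to handle that case is left open here.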


For a symmetric case, when

$$\beta_{11} = \beta_{22} = \beta, \qquad \beta_{12} = \beta_{21} = 1 - \beta \tag{2-10}$$

then

$$\beta_{11}\beta_{22} - \beta_{12}\beta_{21} = 2\beta - 1 \tag{2-11}$$

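From equations (2-7), (2-10), and (2-11),

$$P(\omega = 1)\, p(X \mid \omega = 1) - P(\omega = 2)\, p(X \mid \omega = 2) = \frac{1}{2\beta - 1} \left[ P(\hat{\omega} = 1)\, p(X \mid \hat{\omega} = 1) - P(\hat{\omega} = 2)\, p(X \mid \hat{\omega} = 2) \right] \tag{2-12}$$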





The densities for symmetric and nonsymmetric labeling errors are illustrated
in figures 2-1 and 2-2.



Figure 2-1.- Illustration of densities for symmetric labeling errors. 



Figure 2-2.- Illustration of densities for nonsymmetric labeling errors. 
3. PERFORMANCE OF BAYES CLASSIFIER WITH AND WITHOUT MISLABELING

In this section, we analyze the performance of the Bayes classifier with and
without imperfections in the labels.

3.1 TWO-CLASS BAYES CLASSIFIER PERFORMANCE WITH SYMMETRIC MISLABELING

For a symmetric model, we develop a unique relationship between the
probabilities of error of the two-class Bayes classifier with and without
mislabeling errors. The Bayes decision rule is

$$\begin{aligned}
&\text{Decide } X \in \omega = 1 \text{ if } P(\omega = 1)\, p(X \mid \omega = 1) > P(\omega = 2)\, p(X \mid \omega = 2) \\
&\text{Decide } X \in \omega = 2 \text{ otherwise}
\end{aligned} \tag{3-1}$$

The Bayes probability of error ($P_e$) is

$$P_e = \int \min\left[ P(\omega = 1)\, p(X \mid \omega = 1),\; P(\omega = 2)\, p(X \mid \omega = 2) \right] dX \tag{3-2}$$

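For any two positive real numbers A and B,

$$\min(A, B) = \tfrac{1}{2}(A + B) - \tfrac{1}{2}|A - B| \tag{3-3}$$

Using equations (3-2) and (3-3) yields

$$P_e = \frac{1}{2} - \frac{1}{2} \int \left| P(\omega = 1)\, p(X \mid \omega = 1) - P(\omega = 2)\, p(X \mid \omega = 2) \right| dX \tag{3-4}$$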

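Following the argument presented by equations (3-1) through (3-4), the
probability of error of a two-class Bayes classifier with imperfect labels ($\hat{P}_e$)
can be written as

$$\hat{P}_e = \frac{1}{2} - \frac{1}{2} \int \left| P(\hat{\omega} = 1)\, p(X \mid \hat{\omega} = 1) - P(\hat{\omega} = 2)\, p(X \mid \hat{\omega} = 2) \right| dX \tag{3-5}$$

From equations (2-12), (3-4), and (3-5), we obtain the following.

$$\begin{aligned}
\hat{P}_e &= \frac{1}{2} - \frac{1}{2} |2\beta - 1| \int \left| P(\omega = 1)\, p(X \mid \omega = 1) - P(\omega = 2)\, p(X \mid \omega = 2) \right| dX \\
&= \frac{1}{2} - \frac{1}{2} |2\beta - 1| (1 - 2P_e) \\
&= \frac{1}{2} \left( 1 - |2\beta - 1| \right) + |2\beta - 1|\, P_e
\end{aligned} \tag{3-6}$$

From equation (3-6), writing $P_e$ in terms of $\hat{P}_e$,

$$P_e = \frac{\hat{P}_e}{|2\beta - 1|} - \frac{1 - |2\beta - 1|}{2\,|2\beta - 1|} \tag{3-7}$$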

Equation (3-7) is graphically displayed for various values of $\beta$ in figure 3-1.

Figure 3-1.- Bayes risk for a symmetric model with and without labeling errors.

Figure 3-1 shows the increase in $\hat{P}_e$ because of labeling errors. When $P_e = 0.5$,
the decision is random; hence, $\hat{P}_e$ is independent of $\beta$.
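The relationship in equations (3-6) and (3-7) is easy to evaluate numerically. The sketch below is illustrative Python, with function names chosen here rather than taken from the paper; it converts between the Bayes error with and without symmetric labeling errors.

```python
def noisy_bayes_error(p_e, beta):
    """Equation (3-6): Bayes error measured against imperfect labels."""
    c = abs(2.0 * beta - 1.0)
    return 0.5 * (1.0 - c) + c * p_e


def true_bayes_error(p_e_hat, beta):
    """Equation (3-7): invert (3-6); undefined when beta equals 0.5."""
    c = abs(2.0 * beta - 1.0)
    if c == 0.0:
        raise ValueError("beta = 0.5 gives purely random labels")
    return p_e_hat / c - (1.0 - c) / (2.0 * c)


# Hypothetical values: a true Bayes error of 0.1 observed through labels with beta = 0.9.
p_hat = noisy_bayes_error(0.1, 0.9)
p_back = true_bayes_error(p_hat, 0.9)
```

Evaluating `noisy_bayes_error(0.5, beta)` for any `beta` returns 0.5, which matches the remark that a random decision is unaffected by the labeling errors.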
3.2 MULTICLASS BAYES ERROR UNDER A GENERAL MISLABELING MODEL

The Bayes risks in classifying a pattern X without imperfections ($r$) and with
imperfections ($\hat{r}$) in the labels can be written as

$$r(X) = 1 - \max_i \left[ P(\omega = i \mid X) \right] \tag{3-8}$$

$$\hat{r}(X) = 1 - \max_i \left[ P(\hat{\omega} = i \mid X) \right] \tag{3-9}$$

Then the probabilities of error can be written as

$$P_e = E_{p(X)}[r(X)] \tag{3-10}$$

$$\hat{P}_e = E_{p(X)}[\hat{r}(X)] \tag{3-11}$$

where E is the expectation operator. If the imperfections are not symmetric,
then the Bayes errors depend on the particular probability density functions
of the patterns. However, we obtain bounds in the following manner.


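3.2.1 LOWER BOUND

Let

$$K = \max_i \left[ P(\omega = i \mid X) \right] \tag{3-12}$$

$$\beta_{RSM} = \max_i \left( \sum_{j=1}^{M} \beta_{ji} \right) \tag{3-13}$$

where RSM = row sum maximum.

From equations (3-8), (3-9), (3-12), and (3-13), we obtain

$$\hat{r}(X) = 1 - \max_i \left[ \sum_{j=1}^{M} \beta_{ji}\, P(\omega = j \mid X) \right] \geq (1 - \beta_{RSM}) + \beta_{RSM}\, r(X) \tag{3-14}$$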
Taking the expectations on both sides of equation (3-14) with respect to p(X)
yields the desired inequality

$$\hat{P}_e \geq (1 - \beta_{RSM}) + \beta_{RSM}\, P_e \tag{3-15}$$


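3.2.2 UPPER BOUND

Let

$$P(\omega = k \mid X) = \max_j \left[ P(\omega = j \mid X) \right] \tag{3-16}$$

$$b_i = \min_j (\beta_{ji}) \tag{3-17}$$

$$a = \min_k \max_i (\beta_{ki} - b_i) \tag{3-18}$$

where k is a scalar indicating class.

Consider

$$\begin{aligned}
\max_i \left[ P(\hat{\omega} = i \mid X) \right] &= \max_i \left[ \beta_{ki}\, P(\omega = k \mid X) + \sum_{j=1,\, j \neq k}^{M} \beta_{ji}\, P(\omega = j \mid X) \right] \\
&\geq \max_i \left[ (\beta_{ki} - b_i)\, P(\omega = k \mid X) + b_i \right] \\
&\geq a\,[1 - r(X)]
\end{aligned} \tag{3-19}$$

However,

$$\hat{r}(X) = 1 - \max_i \left[ P(\hat{\omega} = i \mid X) \right] \leq (1 - a) + a\, r(X) \tag{3-20}$$

Taking the expectations on both sides of equation (3-20) with respect to p(X),
we obtain

$$\hat{P}_e \leq (1 - a) + a\, P_e \tag{3-21}$$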


3.3 ILLUSTRATION OF BOUNDS

To illustrate the upper and lower bounds, let M = 3. Consider a matrix of
mislabeling probabilities $\beta_{ji}$ as shown in figure 3-2. The various quantities
required to compute the bounds are also shown.

$b_i = \min_j(\beta_{ji})$: 0.05, 0.05, 0.05

$\beta_{RSM} = 1$, $a = 0.85$

Figure 3-2.- Example illustrating upper and lower bounds.

Let $\hat{P}_e = 0.2$. Using the above probabilistic mislabeling model, the bounds
on the true probability of error $P_e$ without imperfections in the labels are
given by

$$0.059 \leq P_e \leq 0.2 \tag{3-22}$$
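The bounds of equations (3-15) and (3-21) can be evaluated with a short Python sketch. The mislabeling matrix used below (0.9 on the diagonal, 0.05 elsewhere) is an assumption chosen only because it reproduces the listed quantities $b_i = 0.05$, $\beta_{RSM} = 1$, and $a = 0.85$; it is not taken from figure 3-2. The function names are likewise hypothetical.

```python
def bound_constants(beta):
    """Return (beta_rsm, a) from equations (3-13) and (3-17)-(3-18).

    beta[j][i] is assumed to hold P(imperfect label = i | perfect label = j).
    """
    m = len(beta)
    # (3-13): largest sum over the perfect-label index j, for a fixed i.
    beta_rsm = max(sum(beta[j][i] for j in range(m)) for i in range(m))
    # (3-17): b_i is the smallest entry with imperfect label i.
    b = [min(beta[j][i] for j in range(m)) for i in range(m)]
    # (3-18): a = min over k of max over i of (beta[k][i] - b[i]).
    a = min(max(beta[k][i] - b[i] for i in range(m)) for k in range(m))
    return beta_rsm, a


def true_error_bounds(p_e_hat, beta):
    """Solve (3-21) and (3-15) for the error without label imperfections."""
    beta_rsm, a = bound_constants(beta)
    lower = (p_e_hat - (1.0 - a)) / a                  # from (3-21)
    upper = (p_e_hat - (1.0 - beta_rsm)) / beta_rsm    # from (3-15)
    return max(lower, 0.0), upper


# Assumed matrix, consistent with the quantities listed for figure 3-2.
beta = [[0.90, 0.05, 0.05],
        [0.05, 0.90, 0.05],
        [0.05, 0.05, 0.90]]
lower, upper = true_error_bounds(0.2, beta)
```

With these assumed values the sketch gives a lower bound of 0.05/0.85 (about 0.059) and an upper bound of 0.2, in agreement with equation (3-22).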


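4. PERFORMANCE OF MULTICATEGORY NEAREST NEIGHBOR CLASSIFIER
UNDER A GENERAL MODEL OF MISLABELING

In the case of imperfectly labeled patterns, given the pattern X, the
conditional nearest neighbor risk can be written as

$$\begin{aligned}
\hat{r}_N(X) &= P(\omega = 1 \mid X) \sum_{j=1,\, j \neq 1}^{M} P(\hat{\omega} = j \mid X) + \cdots + P(\omega = M \mid X) \sum_{j=1,\, j \neq M}^{M} P(\hat{\omega} = j \mid X) \\
&= 1 - \sum_{i=1}^{M} P(\omega = i \mid X)\, P(\hat{\omega} = i \mid X) \\
&= 1 - \sum_{i=1}^{M} \sum_{j=1}^{M} \beta_{ji}\, P(\omega = i \mid X)\, P(\omega = j \mid X)
\end{aligned} \tag{4-1}$$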
In the following subsections, we obtain bounds on the nearest neighbor error
in terms of the Bayes classifier error.

4.1 LOWER BOUND

Substituting equation (3-12) into (4-1) yields the following result.

$$\begin{aligned}
\hat{r}_N(X) &\geq 1 - K \sum_{i=1}^{M} \sum_{j=1}^{M} \beta_{ji}\, P(\omega = j \mid X) \\
&= 1 - K \sum_{j=1}^{M} P(\omega = j \mid X) \sum_{i=1}^{M} \beta_{ji} \\
&= 1 - K \\
&= r(X)
\end{aligned} \tag{4-2}$$

Taking expectations with respect to p(X) on both sides of equation (4-2)
results in

$$\hat{P}_{eN} \geq P_e \tag{4-3}$$

where $\hat{P}_{eN}$ is the nearest neighbor error.


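4.2 UPPER BOUND

Let

$$P(\omega = k \mid X) = \max_i \left[ P(\omega = i \mid X) \right] \tag{4-4}$$

$$\beta = \min_i (\beta_{ii}) \tag{4-5}$$

$$\beta_m = \min_{i,\, j;\; i \neq j} (\beta_{ji}) \tag{4-6}$$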
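Consider the following.

$$\begin{aligned}
\sum_{i=1}^{M} \sum_{j=1}^{M} \beta_{ji}\, P(\omega = i \mid X)\, P(\omega = j \mid X) &= \sum_{i=1}^{M} \beta_{ii}\, P^2(\omega = i \mid X) + \sum_{i=1}^{M} P(\omega = i \mid X) \sum_{j=1,\, j \neq i}^{M} \beta_{ji}\, P(\omega = j \mid X) \\
&= \beta_{kk}\, P^2(\omega = k \mid X) + \sum_{i=1,\, i \neq k}^{M} \beta_{ii}\, P^2(\omega = i \mid X) \\
&\quad + P(\omega = k \mid X) \sum_{j=1,\, j \neq k}^{M} \beta_{jk}\, P(\omega = j \mid X) \\
&\quad + \sum_{i=1,\, i \neq k}^{M} P(\omega = i \mid X) \sum_{j=1,\, j \neq i}^{M} \beta_{ji}\, P(\omega = j \mid X)
\end{aligned} \tag{4-7}$$

However,

$$\sum_{i=1,\, i \neq k}^{M} P^2(\omega = i \mid X) \geq \frac{1}{M - 1} \left[ \sum_{i=1,\, i \neq k}^{M} P(\omega = i \mid X) \right]^2 = \frac{1}{M - 1} \left[ 1 - P(\omega = k \mid X) \right]^2 \tag{4-8}$$

Now consider

$$P(\omega = k \mid X) \sum_{j=1,\, j \neq k}^{M} \beta_{jk}\, P(\omega = j \mid X) \geq \beta_m\, P(\omega = k \mid X) \left[ 1 - P(\omega = k \mid X) \right] \tag{4-9}$$

Substituting equations (4-8) and (4-9) into (4-7) results in

$$\sum_{i=1}^{M} \sum_{j=1}^{M} \beta_{ji}\, P(\omega = i \mid X)\, P(\omega = j \mid X) \geq \beta\,[1 - r(X)]^2 + \frac{\beta}{M - 1}\, r^2(X) + \beta_m\, r(X)\,[1 - r(X)] \tag{4-10}$$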
From equations (4-1) and (4-10), we obtain

$$\begin{aligned}
\hat{r}_N(X) &\leq 1 - \beta\,[1 - r(X)]^2 - \frac{\beta}{M - 1}\, r^2(X) - \beta_m\, r(X)\,[1 - r(X)] \\
&= (1 - \beta) + (2\beta - \beta_m)\, r(X) - \left( \frac{M}{M - 1}\beta - \beta_m \right) r^2(X)
\end{aligned} \tag{4-11}$$

But we have

$$E_{p(X)}[r^2(X)] = \mathrm{Var}[r(X)] + P_e^2 \geq P_e^2 \tag{4-12}$$

$$\frac{M}{M - 1}\beta - \beta_m \geq 0 \tag{4-13}$$

Taking the expectations on both sides of equation (4-11) with respect to p(X)
and using equations (4-12) and (4-13) results in

$$\hat{P}_{eN} \leq (1 - \beta) + (2\beta - \beta_m)\, P_e - \left( \frac{M}{M - 1}\beta - \beta_m \right) P_e^2 \tag{4-14}$$

When $\beta = 1$ and $\beta_m = 0$, we have the perfectly labeled situation, in which case
equation (4-14) becomes

$$P_{eN} \leq P_e \left( 2 - \frac{M}{M - 1} P_e \right) \tag{4-15}$$

It is seen that equation (4-15) is identical to the nearest neighbor bound
obtained by Cover and Hart [9].
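As a numerical check on the bounds of this section, the following Python sketch evaluates the right-hand side of equation (4-14) and compares it with the perfectly labeled case. The function names and the sample values are illustrative assumptions.

```python
def nn_error_upper_bound(p_e, beta, beta_m, num_classes):
    """Right-hand side of equation (4-14).

    beta is the smallest diagonal mislabeling probability (4-5) and
    beta_m the smallest off-diagonal one (4-6).
    """
    m = float(num_classes)
    return ((1.0 - beta) + (2.0 * beta - beta_m) * p_e
            - (m * beta / (m - 1.0) - beta_m) * p_e ** 2)


def perfect_label_nn_bound(p_e, num_classes):
    """Equation (4-15), the bound for perfectly labeled patterns."""
    m = float(num_classes)
    return p_e * (2.0 - m * p_e / (m - 1.0))


# Hypothetical values: Bayes error 0.1, three classes.
noisy_bound = nn_error_upper_bound(0.1, 0.9, 0.05, 3)
clean_bound = nn_error_upper_bound(0.1, 1.0, 0.0, 3)
```

Setting `beta = 1` and `beta_m = 0` makes the two functions agree, which mirrors the reduction of (4-14) to (4-15).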


5. DESIGN OF PATTERN CLASSIFIERS WITH IMPERFECTLY LABELED PATTERNS 

In this section, we consider the problem of designing a classifier with
imperfectly labeled training patterns. Once the amount of imperfection is known,
this knowledge can be incorporated into the classifier training, which results
in improved performance.
5.1 INCREMENT ERROR CORRECTION CLASSIFIER (NONPARAMETRIC TRAINING)

Consider the case of two pattern classes. Assume a given set of training
patterns $X_1(1), X_1(2), \ldots, X_1(N_1)$ and $X_{-1}(1), X_{-1}(2), \ldots, X_{-1}(N_{-1})$ from
classes 1 and -1, with imperfect labels $\hat{\omega}_1(1), \ldots, \hat{\omega}_1(N_1)$; $\hat{\omega}_{-1}(1), \ldots, \hat{\omega}_{-1}(N_{-1})$.
Let the perfect labels of these patterns be $\omega_1(1), \ldots, \omega_1(N_1)$;
$\omega_{-1}(1), \ldots, \omega_{-1}(N_{-1})$. The imperfect and perfect labels take values of
1 and -1. The objective is to find a decision function d(X), such that

$$\begin{aligned}
d(X) &> 0 \text{ when } X \in \omega_1 \\
d(X) &< 0 \text{ when } X \in \omega_{-1}
\end{aligned} \tag{5-1}$$


5.1.1 ALTERNATIVE REPRESENTATION OF IMPERFECTIONS

The imperfect labels can be modeled from perfect labels as follows.

$$\hat{\omega} = \omega n \tag{5-2}$$

where n = labeling noise and takes values of 1 and -1. Since $\omega$ takes 1 and
-1, whenever $\hat{\omega}$ differs from $\omega$ (that is, n = -1) we have an imperfect label.
Recalling our previous model of imperfections, we have

$$\beta_{ij} = P(\hat{\omega} = j \mid \omega = i); \quad i, j = \pm 1 \tag{5-3}$$

Let

$$\bar{n} = E(n) \tag{5-4}$$

and

$$P_i = P(\omega = i); \quad i = \pm 1 \tag{5-5}$$

Since n takes values of 1 or -1, the average value of n, $\bar{n}$, can be written as
follows.
follows. 
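$$\begin{aligned}
\bar{n} &= E(n) \\
&= P(n = 1) - P(n = -1) \\
&= P(n = 1, \omega = 1) + P(n = 1, \omega = -1) - P(n = -1, \omega = 1) - P(n = -1, \omega = -1) \\
&= P_1 P(n = 1 \mid \omega = 1) + P_{-1} P(n = 1 \mid \omega = -1) - P_1 P(n = -1 \mid \omega = 1) - P_{-1} P(n = -1 \mid \omega = -1) \\
&= P_1 P(\hat{\omega} = 1 \mid \omega = 1) + P_{-1} P(\hat{\omega} = -1 \mid \omega = -1) - P_1 P(\hat{\omega} = -1 \mid \omega = 1) - P_{-1} P(\hat{\omega} = 1 \mid \omega = -1) \\
&= P_1 \beta_{1,1} + P_{-1} \beta_{-1,-1} - P_1 \beta_{1,-1} - P_{-1} \beta_{-1,1} \\
&= (2\beta_{1,1} - 1) P_1 + (2\beta_{-1,-1} - 1) P_{-1}
\end{aligned} \tag{5-6}$$

Under a symmetric model, the above becomes

$$\bar{n} = 2\beta - 1 \tag{5-7}$$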

5.1.2 AN ERROR CORRECTION ALGORITHM WITH IMPERFECT LABELS

We seek a linear approximation to d(X),

$$d(X) = X^T W \tag{5-8}$$

where W is the weight vector. Let $d_0(X)$ be the unknown decision function and
$d^*(X)$ be the optimal decision function. Suppose we set up a criterion

$$C(W) = E[a(W, X)] \tag{5-9}$$

where

$$a(W, X) = W^T X \left\{ \mathrm{Sgn}(W^T X) - \mathrm{Sgn}[d_0(X)] \right\} \tag{5-10}$$

Let $W^*$ be the value of W which minimizes C(W); then

$$C(W) \geq C(W^*) \tag{5-11}$$
The corresponding optimal approximation $d^*(X)$ to $d_0(X)$ is given by

$$d^*(X) = X^T W^* \tag{5-12}$$

At the $\ell$th step of training, the weight vector $W(\ell)$ is updated using the
steepest descent method.

$$W(\ell + 1) = W(\ell) - \nu(\ell) \left. \frac{\partial C(W)}{\partial W} \right|_{W = W(\ell)} \tag{5-13}$$

Since the gradient is not known, the above is approximated by

$$W(\ell + 1) = W(\ell) - \nu(\ell) \left. g(W, X) \right|_{X = X(\ell),\, W = W(\ell)} \tag{5-14}$$

where $X(\ell)$ is the training pattern at the $\ell$th step and

$$g(W, X) = \left\{ \mathrm{Sgn}(W^T X) - \mathrm{Sgn}[d_0(X)] \right\} X \tag{5-15}$$

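Since the perfect label $\mathrm{Sgn}[d_0(X)]$ is unknown, we replace g(W,X) with
$f[W, X, \hat{\omega}(X)]$ so that f is observable for any X and has the same expected
value as g, where

$$f[W, X, \hat{\omega}(X)] = \frac{1}{\bar{n}} \left[ \bar{n}\, \mathrm{Sgn}(W^T X) - \hat{\omega}(X) \right] X \tag{5-16}$$

Using $\hat{\omega}(X) = n\, \mathrm{Sgn}[d_0(X)]$ from equation (5-2),

$$\begin{aligned}
E\{ f[W, X, \hat{\omega}(X)] \mid W \} &= E\left[ \left( \frac{1}{\bar{n}} \left[ \bar{n}\, \mathrm{Sgn}(W^T X) - \hat{\omega}(X) \right] + \mathrm{Sgn}[d_0(X)] - \mathrm{Sgn}[d_0(X)] \right) X \,\Big|\, W \right] \\
&= E\left( \left\{ \mathrm{Sgn}(W^T X) - \mathrm{Sgn}[d_0(X)] \right\} X \mid W \right) + E\left( \left\{ \mathrm{Sgn}[d_0(X)] - \frac{n}{\bar{n}}\, \mathrm{Sgn}[d_0(X)] \right\} X \,\Big|\, W \right) \\
&= E\left( \left\{ \mathrm{Sgn}(W^T X) - \mathrm{Sgn}[d_0(X)] \right\} X \mid W \right) \\
&= E[g(W, X) \mid W]
\end{aligned} \tag{5-17}$$

The second expectation vanishes when the labeling noise n is independent of X,
since $E(n) = \bar{n}$.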

Hence, the error correction algorithm for updating the weight vector $W(\ell)$ at
the $\ell$th step of training can be written as

$$W(\ell + 1) = W(\ell) - \nu(\ell)\, \frac{1}{\bar{n}} \left\{ \bar{n}\, \mathrm{Sgn}[X^T W(\ell)] - \hat{\omega}(X) \right\} X \tag{5-18}$$

where X is the training pattern at the $\ell$th step of training and $\bar{n}$ is given
by equation (5-6). For the convergence of this algorithm, the conditions
on $\nu(\ell)$ can be shown to be [3]:

$$\nu(\ell) > 0, \qquad \sum_{\ell=1}^{\infty} \nu(\ell) = \infty, \qquad \sum_{\ell=1}^{\infty} \nu^2(\ell) < \infty \tag{5-19}$$
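A single step of the update in equation (5-18) can be sketched in plain Python as follows. The function names, the convention that `sgn(0)` equals +1, and the numerical values are assumptions made for this illustration; a step-size schedule such as `1.0 / (step_index + 1)` is one choice that satisfies conditions (5-19).

```python
def sgn(value):
    """Sign function taking values +1 or -1 (zero is mapped to +1 here)."""
    return 1.0 if value >= 0.0 else -1.0


def mean_noise(beta_pos, beta_neg, prior_pos):
    """Equation (5-6): n_bar from the two correct-label probabilities."""
    return ((2.0 * beta_pos - 1.0) * prior_pos
            + (2.0 * beta_neg - 1.0) * (1.0 - prior_pos))


def corrected_update(weights, pattern, noisy_label, n_bar, step):
    """One step of equation (5-18) using the imperfect label."""
    activation = sum(w * x for w, x in zip(weights, pattern))
    scale = step * (n_bar * sgn(activation) - noisy_label) / n_bar
    return [w - scale * x for w, x in zip(weights, pattern)]


# Hypothetical step: symmetric noise with beta = 0.9 and equal priors.
n_bar = mean_noise(0.9, 0.9, 0.5)
new_weights = corrected_update([0.0, 0.0], [1.0, 2.0], 1.0, n_bar, 0.5)
```

Note that, unlike the perfectly labeled rule, the weights move a little even when the imperfect label agrees with the current decision; the correction is exact only in expectation.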

5.2 BAYES CLASSIFIER (PARAMETRIC TRAINING)

Once the a priori probabilities and class conditional densities of the
imperfectly labeled patterns and the mislabeling probabilities $\beta_{ij}$'s are
estimated, equations (2-7) and (2-8) can be used to obtain the a priori
probabilities and class conditional densities of the perfectly labeled
patterns. Then, the following algorithm can be used to classify the patterns.

$$\text{Decide } X \in \omega = i \text{ if } P(\omega = i)\, p(X \mid \omega = i) \geq \max_{\ell} \left[ P(\omega = \ell)\, p(X \mid \omega = \ell) \right]; \quad \ell = 1, 2, \ldots, M \tag{5-20}$$


5.3 THE CASE WHEN $\beta_{ij}$'S ARE UNKNOWN

In the case when $\beta_{ij}$'s are unknown, the scheme shown in figure 5-1 can be
used to design the classifier.

Figure 5-1.- Flow diagram for learning with imperfectly labeled
patterns when $\beta_{ij}$'s are unknown.








6. CORRECTION OF IMPERFECT LABELS OF TRAINING PATTERNS 


This section deals with methods of correcting imperfect labels of training 
patterns. Whenever a method is confident enough to show that the label is 
imperfect, we correct the label of the training pattern. 

6.1 LABEL CORRECTION USING k-NEAREST NEIGHBOR DECISION RULE 

The nearest neighbor decision rule can be used to correct imperfect labels in
training patterns [4]. Suppose that n - 1 training patterns are processed
with imperfect labels. When a pattern $X_n$ with imperfect label $\hat{\omega}(X_n)$ is
presented to the algorithm, a guess of the true label of $X_n$ must be made by
combining the information in $\hat{\omega}(X_n)$ and the information in the previously
processed n - 1 training patterns. The new label of $X_n$, $\tilde{\omega}(X_n)$, is determined
in the following manner.

Assume that two positive integers k and k' are given, where k is an odd
integer and k' is an integer such that k' > (k + 1)/2. Using a distance
metric d, the k-nearest neighbors to $X_n$ among the training patterns $X_1$,
$X_2$, ..., $X_{n-1}$ are located. If at least k' of the nearest neighbors to $X_n$
have the same value for their class labels, $\tilde{\omega}(X_n)$ is set to that value.
Otherwise, $\tilde{\omega}(X_n)$ is set to the value of $\hat{\omega}(X_n)$, the label provided by the
teacher. The process is repeated to obtain the label of pattern $X_{n+1}$.

The integer k' specifies the degree of confidence required in the labels of the
k-nearest neighbors of $X_n$ before the teacher's label of $X_n$ is changed.

At least k of the training patterns must be obtained before the beginning of
the label correction process. Also, at least k - k' + 1 of the teacher's
labels are accepted for each class before the label correction process is
begun, in order to avoid the algorithm's labeling of all patterns into one
class.

At the termination of the label correction process, unlabeled patterns can
be classified using the k-nearest neighbor decision algorithm.
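The sequential procedure described above can be sketched in Python as follows. This is a simplified illustration, not the implementation of [4]: the function name, the `warmup` parameter, the use of Euclidean distance, and the choice to vote with the already corrected labels of earlier patterns are all assumptions. The caller is expected to pick a warm-up set that contains enough teacher labels from each class.

```python
import math


def correct_labels_knn(patterns, teacher_labels, k, k_prime, warmup):
    """Relabel patterns one at a time using a k-nearest neighbor vote.

    The first `warmup` patterns (warmup >= k) keep their teacher labels.
    """
    if warmup < k:
        raise ValueError("at least k patterns are needed before correction")
    corrected = list(teacher_labels[:warmup])
    for n in range(warmup, len(patterns)):
        # Indices of the k nearest previously processed patterns.
        order = sorted(range(n),
                       key=lambda i: math.dist(patterns[i], patterns[n]))
        votes = {}
        for i in order[:k]:
            votes[corrected[i]] = votes.get(corrected[i], 0) + 1
        label, count = max(votes.items(), key=lambda item: item[1])
        # Override the teacher only when at least k' neighbors agree.
        corrected.append(label if count >= k_prime else teacher_labels[n])
    return corrected


# Hypothetical one-dimensional data; the last two teacher labels are wrong.
patterns = [(0.0,), (0.1,), (5.0,), (5.1,), (0.2,), (5.2,), (0.15,), (5.05,)]
teacher = [1, 1, 2, 2, 1, 2, 2, 1]
fixed = correct_labels_knn(patterns, teacher, k=3, k_prime=3, warmup=3)
```

In this toy run the two mislabeled patterns near 0.15 and 5.05 are surrounded by three agreeing neighbors, so their labels are changed, while the earlier patterns keep the teacher's labels.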
6.2 LABEL CORRECTION USING BAYES DECISION RULE

In this section, an algorithm is developed to correct the imperfections in
the labels of the training patterns. Assume two pattern classes and that
the densities are Gaussian; i.e.,

$$p(X \mid \omega = i) \sim N(M_i, \Sigma_i); \quad i = 1, 2 \tag{6-1}$$

where $M_i$ is the mean and $\Sigma_i$ is the covariance matrix of the patterns in the
classes $\omega = i$, i = 1, 2. The Bayes decision rule uses the following criterion.

$$\begin{aligned}
&\text{Decide } X \in \omega = 1 \text{ if } d(X) > 0 \\
&\text{and decide } X \in \omega = 2 \text{ if } d(X) < 0
\end{aligned} \tag{6-2}$$

where

$$d(X) = \log \frac{P(\omega = 1)\, p(X \mid \omega = 1)}{P(\omega = 2)\, p(X \mid \omega = 2)} \tag{6-3}$$

The following scheme for correcting imperfections in the labels is proposed.

$$\begin{aligned}
&\text{Change the label of } X \text{ to } \omega = 1 \text{ if } d(X) > t_1 \\
&\text{Change the label of } X \text{ to } \omega = 2 \text{ if } d(X) < -t_2 \\
&\text{Do not change the label of } X \text{ if } -t_2 \leq d(X) \leq t_1
\end{aligned} \tag{6-4}$$

where $t_1$ and $t_2$ are the thresholds.

6.2.1 DISTRIBUTION OF d(X)

Assume that $[d(X) \mid X \in \hat{\omega} = i]$, i = 1, 2, is a Gaussian random variable with
mean $m_i$ and variance $\sigma_i^2$, i = 1, 2; i.e., $p[d(X) \mid X \in \hat{\omega} = i] = N(m_i, \sigma_i^2)$. Then,
the expressions for $m_i$ and $\sigma_i^2$ can be shown to be [10]:



6.2.2 SELECTION OF THRESHOLDS $t_1$ AND $t_2$

We propose to select the thresholds $t_1$ and $t_2$ for correcting the imperfect
labels by specifying the probability $\alpha$ that mislabeling will occur in the
correction process.

$$\begin{aligned}
\alpha &= P(\text{bad label}) \\
&= P(\omega = 1)\, P(\text{bad label} \mid X \in \omega = 1) + P(\omega = 2)\, P(\text{bad label} \mid X \in \omega = 2) \\
&= P(\omega = 1)\, P[d(X) < -t_2 \mid X \in \omega = 1] + P(\omega = 2)\, P[d(X) > t_1 \mid X \in \omega = 2]
\end{aligned} \tag{6-6}$$
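Following the argument and assumptions similar to those of section 2, it can
be shown that

$$P(\omega = 1)\, p[d(X) \mid \omega = 1] = \frac{1}{\beta_{11}\beta_{22} - \beta_{12}\beta_{21}} \left\{ \beta_{22}\, P(\hat{\omega} = 1)\, p[d(X) \mid \hat{\omega} = 1] - \beta_{21}\, P(\hat{\omega} = 2)\, p[d(X) \mid \hat{\omega} = 2] \right\} \tag{6-7}$$

$$P(\omega = 2)\, p[d(X) \mid \omega = 2] = \frac{1}{\beta_{11}\beta_{22} - \beta_{12}\beta_{21}} \left\{ \beta_{11}\, P(\hat{\omega} = 2)\, p[d(X) \mid \hat{\omega} = 2] - \beta_{12}\, P(\hat{\omega} = 1)\, p[d(X) \mid \hat{\omega} = 1] \right\} \tag{6-8}$$

Then,

$$\begin{aligned}
P(\omega = 1)\, P[d(X) < -t_2 \mid X \in \omega = 1] &= P(\omega = 1) \int_{-\infty}^{-t_2} p[d(X) \mid \omega = 1]\, d[d(X)] \\
&= \frac{1}{\beta_{11}\beta_{22} - \beta_{12}\beta_{21}} \left\{ \beta_{22}\, P(\hat{\omega} = 1) \int_{-\infty}^{-t_2} p[d(X) \mid \hat{\omega} = 1]\, d[d(X)] - \beta_{21}\, P(\hat{\omega} = 2) \int_{-\infty}^{-t_2} p[d(X) \mid \hat{\omega} = 2]\, d[d(X)] \right\} \\
&= \frac{1}{\beta_{11}\beta_{22} - \beta_{12}\beta_{21}} \left\{ \beta_{22}\, P(\hat{\omega} = 1) \int_{-\infty}^{(-t_2 - m_1)/\sigma_1} N(0,1)\, d\xi - \beta_{21}\, P(\hat{\omega} = 2) \int_{-\infty}^{(-t_2 - m_2)/\sigma_2} N(0,1)\, d\xi \right\}
\end{aligned} \tag{6-9}$$

where N(0,1) is a Gaussian density function with zero mean and unit variance.
After an argument similar to equation (6-9), from equation (6-8) we obtain
the following.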
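$$\begin{aligned}
P(\omega = 2)\, P(\text{bad label} \mid X \in \omega = 2) &= P(\omega = 2) \int_{t_1}^{\infty} p[d(X) \mid X \in \omega = 2]\, d[d(X)] \\
&= \frac{1}{\beta_{11}\beta_{22} - \beta_{12}\beta_{21}} \left\{ \beta_{11}\, P(\hat{\omega} = 2) \int_{(t_1 - m_2)/\sigma_2}^{\infty} N(0,1)\, d\xi - \beta_{12}\, P(\hat{\omega} = 1) \int_{(t_1 - m_1)/\sigma_1}^{\infty} N(0,1)\, d\xi \right\}
\end{aligned} \tag{6-10}$$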


For a specified $\alpha$, using equations (6-9) and (6-10) in (6-6), $t_1$ and $t_2$ can
be determined by one-dimensional numerical integration, and imperfect labels
can be corrected using the algorithm in equation (6-4).
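To make the threshold selection concrete, the following Python sketch evaluates $\alpha$ from equations (6-6), (6-9), and (6-10) for given thresholds, using the error function for the standard normal integral. The function names, the argument layout, and the numerical values are assumptions for this illustration; the means and standard deviations are taken to describe d(X) within the imperfectly labeled classes.

```python
import math


def normal_cdf(z):
    """Integral of the N(0,1) density from minus infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bad_label_probability(t1, t2, beta, noisy_priors, means, sigmas):
    """Evaluate alpha of equation (6-6) via equations (6-9) and (6-10).

    beta = ((b11, b12), (b21, b22)); noisy_priors, means, and sigmas are
    pairs indexed by imperfect class 1 and 2.
    """
    (b11, b12), (b21, b22) = beta
    det = b11 * b22 - b12 * b21
    q1, q2 = noisy_priors
    (m1, m2), (s1, s2) = means, sigmas
    # (6-9): a class-1 pattern falls below -t2.
    term1 = (b22 * q1 * normal_cdf((-t2 - m1) / s1)
             - b21 * q2 * normal_cdf((-t2 - m2) / s2)) / det
    # (6-10): a class-2 pattern falls above t1.
    term2 = (b11 * q2 * (1.0 - normal_cdf((t1 - m2) / s2))
             - b12 * q1 * (1.0 - normal_cdf((t1 - m1) / s1))) / det
    return term1 + term2


# Hypothetical check with perfect labels (beta is the identity matrix).
identity = ((1.0, 0.0), (0.0, 1.0))
alpha0 = bad_label_probability(0.0, 0.0, identity, (0.5, 0.5), (2.0, -2.0), (1.0, 1.0))
alpha1 = bad_label_probability(1.0, 1.0, identity, (0.5, 0.5), (2.0, -2.0), (1.0, 1.0))
```

Because (6-6) is a single equation in two unknowns, some additional constraint (for example $t_1 = t_2$, an assumption not made in the text) is needed before a one-dimensional search over the threshold can match a specified $\alpha$.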


7. ONE-DIMENSIONAL CLASSIFIER WITH IMPERFECTLY LABELED PATTERNS 

Consider the training of a one-dimensional version of the increment error
correction classifier considered in section 5.1. Let $\omega$ and $\hat{\omega}$ be the perfect
and imperfect labels of training patterns that take values 1 or -1. Assuming
a symmetric model for the imperfections, i.e.,

$$\begin{aligned}
P(\hat{\omega} = 1 \mid \omega = 1) &= \beta = P(\hat{\omega} = -1 \mid \omega = -1) \\
\text{then } P(\hat{\omega} = -1 \mid \omega = 1) &= 1 - \beta = P(\hat{\omega} = 1 \mid \omega = -1)
\end{aligned} \tag{7-1}$$

As in section 5.1, the imperfections in the labels can be considered in
terms of a quantity n, which takes values 1 or -1.

$$\hat{\omega} = \omega n \tag{7-2}$$

In section 5.1, $\bar{n} = E(n)$ is related to $\beta$ as

$$\bar{n} = 2\beta - 1 \tag{7-3}$$

The decision rule for one-dimensional patterns is

$$\begin{aligned}
&\text{Decide } x \in \omega = 1, \text{ if } x_n > k_n \\
&\text{Decide } x \in \omega = -1, \text{ if } x_n < k_n
\end{aligned} \tag{7-4}$$
where $k_n$ is the threshold at the nth training step. If $\omega(x_n)$ is the label
of $x_n$, the pattern at the nth training step, the usual training procedure
is to adjust $k_n$ according to the following relation.

$$k_{n+1} = k_n - \left[ \omega(x_n) - \mathrm{Sgn}(x_n - k_n) \right] \Delta k_n \tag{7-5}$$

where $\Delta k_n$ is the increment of adjustment. Since the label $\omega(x_n)$ is not
known, we modify equation (7-5) according to (7-6).

$$k_{n+1} = k_n - \frac{\nu_n}{\bar{n}} \left[ \hat{\omega}(x_n) - \bar{n}\, \mathrm{Sgn}(x_n - k_n) \right] \Delta k_n \tag{7-6}$$
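One step of the modified threshold adjustment in equation (7-6) can be sketched as follows. This is an illustrative Python fragment under the symmetric model; the function names, the convention that `sgn(0)` equals +1, and the sample numbers are assumptions, and `step` plays the role of the gain $\nu_n$.

```python
def sgn(value):
    """Sign function taking values +1 or -1 (zero is mapped to +1 here)."""
    return 1.0 if value >= 0.0 else -1.0


def update_threshold(k, x, noisy_label, beta, step, delta_k):
    """One step of equation (7-6) with n_bar = 2*beta - 1, equation (7-3)."""
    n_bar = 2.0 * beta - 1.0
    return k - (step / n_bar) * (noisy_label - n_bar * sgn(x - k)) * delta_k


# Hypothetical steps from k = 0 with a pattern at x = 1 and beta = 0.9.
k_agree = update_threshold(0.0, 1.0, 1.0, 0.9, 1.0, 0.1)      # label matches the decision
k_disagree = update_threshold(0.0, 1.0, -1.0, 0.9, 1.0, 0.1)  # label contradicts it
```

The two outcomes shift the threshold by `step * (1 - n_bar) / n_bar * delta_k` and `step * (1 + n_bar) / n_bar * delta_k`, respectively, which are the two step sizes that the learning-dynamics analysis of this section works with.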

Similar to the results shown in section 5.1, the conditions on the gains $\nu_n$ for
convergence of equation (7-6) can be shown to be

$$\nu_n > 0, \qquad \sum_{n=1}^{\infty} \nu_n = \infty, \qquad \sum_{n=1}^{\infty} \nu_n^2 < \infty \tag{7-7}$$

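This section contains the learning dynamics of this procedure; i.e.,
expressions for the learning curves. Let $f_1(x)$ and $f_{-1}(x)$ be the constituent
densities of patterns in classes $\omega = 1$ and $\omega = -1$, respectively. Similarly,
let $\hat{f}_1(x)$ and $\hat{f}_{-1}(x)$ be the constituent densities of the patterns in classes
$\hat{\omega} = 1$ and $\hat{\omega} = -1$, respectively. Let

$$\begin{aligned}
P_1 &= \int_{-\infty}^{\infty} f_1(x)\, dx \quad \text{and} \quad P_{-1} = \int_{-\infty}^{\infty} f_{-1}(x)\, dx \\
\hat{P}_1 &= \int_{-\infty}^{\infty} \hat{f}_1(x)\, dx \quad \text{and} \quad \hat{P}_{-1} = \int_{-\infty}^{\infty} \hat{f}_{-1}(x)\, dx
\end{aligned} \tag{7-8}$$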
Then, the following probabilities can be easily obtained. 
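$$\begin{aligned}
\Pr[\hat{\omega} = 1, \mathrm{Sgn}(x - k) = 1] &= \int_{k}^{\infty} \hat{f}_1(x)\, dx = \hat{P}_1 - \int_{-\infty}^{k} \hat{f}_1(x)\, dx = \hat{P}_1 - \hat{F}_1(k) \\
\Pr[\hat{\omega} = 1, \mathrm{Sgn}(x - k) = -1] &= \int_{-\infty}^{k} \hat{f}_1(x)\, dx = \hat{F}_1(k) \\
\Pr[\hat{\omega} = -1, \mathrm{Sgn}(x - k) = 1] &= \int_{k}^{\infty} \hat{f}_{-1}(x)\, dx = \hat{P}_{-1} - \hat{F}_{-1}(k) \\
\Pr[\hat{\omega} = -1, \mathrm{Sgn}(x - k) = -1] &= \int_{-\infty}^{k} \hat{f}_{-1}(x)\, dx = \hat{F}_{-1}(k)
\end{aligned} \tag{7-9}$$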

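Let $P_n(k)$ be the probability of occurrence of k at time instant n. Then the
training algorithm, equation (7-6), may be described by the following
difference equation.

$$\begin{aligned}
P_{n+1}(k) &= P_n\left[ k + \frac{\nu_n (1 - \bar{n})}{\bar{n}} \Delta k_n \right] \Pr[\hat{\omega} = 1, \mathrm{Sgn}(x - k) = 1] \\
&\quad + P_n\left[ k + \frac{\nu_n (1 + \bar{n})}{\bar{n}} \Delta k_n \right] \Pr[\hat{\omega} = 1, \mathrm{Sgn}(x - k) = -1] \\
&\quad + P_n\left[ k - \frac{\nu_n (1 + \bar{n})}{\bar{n}} \Delta k_n \right] \Pr[\hat{\omega} = -1, \mathrm{Sgn}(x - k) = 1] \\
&\quad + P_n\left[ k - \frac{\nu_n (1 - \bar{n})}{\bar{n}} \Delta k_n \right] \Pr[\hat{\omega} = -1, \mathrm{Sgn}(x - k) = -1]
\end{aligned} \tag{7-10}$$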
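The substitution of equation (7-9) into (7-10) yields

$$\begin{aligned}
P_{n+1}(k) &= P_n\left[ k + \frac{\nu_n (1 - \bar{n})}{\bar{n}} \Delta k_n \right] \left\{ \hat{P}_1 - \hat{F}_1\left[ k + \frac{\nu_n (1 - \bar{n})}{\bar{n}} \Delta k_n \right] \right\} \\
&\quad + P_n\left[ k + \frac{\nu_n (1 + \bar{n})}{\bar{n}} \Delta k_n \right] \hat{F}_1\left[ k + \frac{\nu_n (1 + \bar{n})}{\bar{n}} \Delta k_n \right] \\
&\quad + P_n\left[ k - \frac{\nu_n (1 + \bar{n})}{\bar{n}} \Delta k_n \right] \left\{ \hat{P}_{-1} - \hat{F}_{-1}\left[ k - \frac{\nu_n (1 + \bar{n})}{\bar{n}} \Delta k_n \right] \right\} \\
&\quad + P_n\left[ k - \frac{\nu_n (1 - \bar{n})}{\bar{n}} \Delta k_n \right] \hat{F}_{-1}\left[ k - \frac{\nu_n (1 - \bar{n})}{\bar{n}} \Delta k_n \right]
\end{aligned} \tag{7-11}$$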


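Now a differential equation describing the learning process can be obtained.
Using a continuous approximation and rearranging equation (7-11), after
subtracting $P(k, n\Delta t)$ from both sides of it, we obtain

$$\begin{aligned}
\frac{P[k, (n+1)\Delta t] - P(k, n\Delta t)}{\Delta t} &= \hat{P}_1\, \frac{P(k + a_n, n\Delta t) - P(k, n\Delta t)}{a_n}\, \frac{a_n}{\Delta t} - \hat{P}_{-1}\, \frac{P(k, n\Delta t) - P(k - b_n, n\Delta t)}{b_n}\, \frac{b_n}{\Delta t} \\
&\quad + \frac{P(k + b_n, n\Delta t)\, \hat{F}_1(k + b_n) - P(k + a_n, n\Delta t)\, \hat{F}_1(k + a_n)}{2\nu_n \Delta k_n}\, \frac{2\nu_n \Delta k_n}{\Delta t} \\
&\quad + \frac{P(k - a_n, n\Delta t)\, \hat{F}_{-1}(k - a_n) - P(k - b_n, n\Delta t)\, \hat{F}_{-1}(k - b_n)}{2\nu_n \Delta k_n}\, \frac{2\nu_n \Delta k_n}{\Delta t}
\end{aligned} \tag{7-12}$$

where $a_n = \frac{\nu_n (1 - \bar{n})}{\bar{n}} \Delta k_n = \frac{2(1 - \beta)}{2\beta - 1} \nu_n \Delta k_n$ and $b_n = \frac{\nu_n (1 + \bar{n})}{\bar{n}} \Delta k_n = \frac{2\beta}{2\beta - 1} \nu_n \Delta k_n$
abbreviate the threshold shifts appearing in equation (7-11), so that $b_n - a_n = 2\nu_n \Delta k_n$.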
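Letting $\Delta t \to 0$ and $\Delta k \to 0$, we get the following from equation (7-12).

$$\frac{\partial P}{\partial t} = \hat{P}_1\, \frac{\partial P}{\partial k}\, \frac{2(1 - \beta)}{2\beta - 1}\, \nu(t)\, \frac{\partial k}{\partial t} - \hat{P}_{-1}\, \frac{\partial P}{\partial k}\, \frac{2\beta}{2\beta - 1}\, \nu(t)\, \frac{\partial k}{\partial t} + \frac{\partial (P \hat{F}_1)}{\partial k}\, 2\nu(t)\, \frac{\partial k}{\partial t} + \frac{\partial (P \hat{F}_{-1})}{\partial k}\, 2\nu(t)\, \frac{\partial k}{\partial t} \tag{7-13}$$

Rewriting equation (7-13) yields

$$\frac{\partial P}{\partial t} = \nu(t)\, g(t)\, \frac{\partial (PF)}{\partial k} \tag{7-14}$$

where

$$\begin{aligned}
F &= \frac{2(1 - \beta)}{2\beta - 1}\, \hat{P}_1 - \frac{2\beta}{2\beta - 1}\, \hat{P}_{-1} + 2\hat{F}_1 + 2\hat{F}_{-1} \\
&= 2 \left[ \hat{F}_1 + \hat{F}_{-1} + \frac{\hat{P}_1 - \beta}{2\beta - 1} \right]
\end{aligned} \tag{7-15}$$

$$g(t) = \frac{\partial k}{\partial t} \tag{7-16}$$

The conditions on $\nu(t)$ and g(t) for convergence become

$$\begin{aligned}
&\int_0^{\infty} g(t)\, dt = \infty, \qquad 0 < g(t) < \infty \\
&\int_0^{\infty} \nu(t)\, dt = \infty, \qquad \int_0^{\infty} \nu^2(t)\, dt < \infty, \qquad \nu(t) > 0
\end{aligned} \tag{7-17}$$

The conditional probability of success (S), given k, is

$$\begin{aligned}
S(k) &= P[\omega = 1, \mathrm{Sgn}(x - k) = 1] + P[\omega = -1, \mathrm{Sgn}(x - k) = -1] \\
&= \int_{k}^{\infty} f_1(x)\, dx + \int_{-\infty}^{k} f_{-1}(x)\, dx \\
&= P_1 - \int_{-\infty}^{k} f_1(x)\, dx + \int_{-\infty}^{k} f_{-1}(x)\, dx \\
&= P_1 + F_{-1}(k) - F_1(k)
\end{aligned} \tag{7-18}$$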
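From equation (2-8), we have

$$P_1 = \frac{1}{2\beta - 1} \left[ \beta \hat{P}_1 - (1 - \beta) \hat{P}_{-1} \right] = \frac{1}{2\beta - 1} \left( \beta - \hat{P}_{-1} \right) \tag{7-19}$$

From equation (2-12), we have

$$F_{-1}(k) - F_1(k) = \frac{1}{2\beta - 1} \left[ \hat{F}_{-1}(k) - \hat{F}_1(k) \right]$$

Thus, S(k) is given by

$$S(k) = \frac{1}{2\beta - 1} \left[ \hat{F}_{-1}(k) - \hat{F}_1(k) + \left( \beta - \hat{P}_{-1} \right) \right] \tag{7-20}$$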
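The agreement between equations (7-18) and (7-20) can be checked numerically. The Python sketch below assumes, purely for illustration, that the two class densities are Gaussian with a common standard deviation; the function names and parameter values are hypothetical. The imperfect-label quantities are built from the perfect ones using the symmetric model of equation (7-1).

```python
import math


def normal_cdf(z):
    """Integral of the N(0,1) density from minus infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def success_from_perfect(k, p1, mu1, mu_neg, sigma):
    """Equation (7-18): S(k) = P_1 + F_{-1}(k) - F_1(k)."""
    f1 = p1 * normal_cdf((k - mu1) / sigma)
    f_neg = (1.0 - p1) * normal_cdf((k - mu_neg) / sigma)
    return p1 + f_neg - f1


def success_from_imperfect(k, p1, mu1, mu_neg, sigma, beta):
    """Equation (7-20), using only imperfect-label quantities and beta."""
    f1 = p1 * normal_cdf((k - mu1) / sigma)
    f_neg = (1.0 - p1) * normal_cdf((k - mu_neg) / sigma)
    # Symmetric mislabeling mixes the two constituent distributions.
    f1_hat = beta * f1 + (1.0 - beta) * f_neg
    f_neg_hat = (1.0 - beta) * f1 + beta * f_neg
    p_neg_hat = (1.0 - beta) * p1 + beta * (1.0 - p1)
    return (f_neg_hat - f1_hat + beta - p_neg_hat) / (2.0 * beta - 1.0)


# Hypothetical setting: class 1 centered at +1, class -1 centered at -1.
s_true = success_from_perfect(0.0, 0.6, 1.0, -1.0, 1.0)
s_from_noisy = success_from_imperfect(0.0, 0.6, 1.0, -1.0, 1.0, 0.8)
```

The two values coincide for any threshold k, which illustrates that the success probability can be computed from imperfectly labeled data once $\beta$ is known.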


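The success probability (Z) at any time instant t is defined as

$$Z(t) = \int_{-\infty}^{\infty} P(k, t)\, S(k)\, dk \tag{7-21}$$

and P satisfies the differential equation

$$\frac{\partial P}{\partial t} = \nu(t)\, g(t)\, \frac{\partial (PF)}{\partial k} \tag{7-22}$$

The solution to this equation is given by [11]:

(7-23)

Then

$$Z(t) = S[V(y - y_0)] \tag{7-24}$$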

Hence, Z(t) can be plotted as a function of time to study the learning char- 
acteristics of the training algorithm. 
8. FEATURE SELECTION CRITERIA WITH IMPERFECTLY LABELED PATTERNS

Probabilistic distance measures are normally used in practice as a feature
evaluation criterion for selecting the best features. Of all the probabilistic
distance measures, the Bhattacharyya distance is most frequently used, since
it is easy to evaluate under the Gaussian assumption and has a general
relationship to the Bayes probability of error. In this section, we present a
relationship between the Bayes probability of error and the Bhattacharyya distance
with imperfectly labeled patterns under the symmetric model discussed in
section 2.

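Consider the case of two pattern classes. Let $\hat{P}_e$ and $\hat{\rho}$ be the Bayes
probability of error and the Bhattacharyya coefficient with imperfectly labeled
patterns. $\hat{\rho}$ is defined as

$$\hat{\rho} = \int \left[ p(X \mid \hat{\omega} = 1)\, p(X \mid \hat{\omega} = 2) \right]^{1/2} dX \tag{8-1}$$

It is well known [12] that $\hat{P}_e$ and $\hat{\rho}$ are related as

$$\frac{1}{2} - \frac{1}{2} \left[ 1 - 4 P(\hat{\omega} = 1)\, P(\hat{\omega} = 2)\, \hat{\rho}^2 \right]^{1/2} \leq \hat{P}_e \leq \left[ P(\hat{\omega} = 1)\, P(\hat{\omega} = 2) \right]^{1/2} \hat{\rho} \tag{8-2}$$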

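From equation (3-7) in section 3.1, the relationship between the Bayes probability
of error with ($\hat{P}_e$) and without ($P_e$) imperfections is

$$P_e = \frac{\hat{P}_e}{|2\beta - 1|} - \frac{1 - |2\beta - 1|}{2\,|2\beta - 1|} \tag{8-3}$$

From equations (8-2) and (8-3), the desired relationship is obtained.

$$\frac{1 - \left[ 1 - 4 P(\hat{\omega} = 1)\, P(\hat{\omega} = 2)\, \hat{\rho}^2 \right]^{1/2}}{2\,|2\beta - 1|} - \frac{1 - |2\beta - 1|}{2\,|2\beta - 1|} \leq P_e \leq \frac{\left[ P(\hat{\omega} = 1)\, P(\hat{\omega} = 2) \right]^{1/2} \hat{\rho}}{|2\beta - 1|} - \frac{1 - |2\beta - 1|}{2\,|2\beta - 1|} \tag{8-4}$$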
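The bounds in equation (8-4) can be evaluated directly from quantities estimated on imperfectly labeled data. The following Python sketch is illustrative only; the function name, argument names, and sample values are assumptions.

```python
import math


def true_error_bounds_from_rho(rho_hat, noisy_prior_1, beta):
    """Lower and upper bounds of equation (8-4) on the Bayes error P_e.

    rho_hat is the Bhattacharyya coefficient (8-1) and noisy_prior_1 the
    a priori probability of class 1, both measured with imperfect labels.
    """
    q1, q2 = noisy_prior_1, 1.0 - noisy_prior_1
    c = abs(2.0 * beta - 1.0)
    if c == 0.0:
        raise ValueError("beta = 0.5 gives purely random labels")
    # Bounds (8-2) on the error measured with imperfect labels.
    lower_hat = 0.5 - 0.5 * math.sqrt(1.0 - 4.0 * q1 * q2 * rho_hat ** 2)
    upper_hat = math.sqrt(q1 * q2) * rho_hat
    # Map both bounds through equation (8-3).
    shift = (1.0 - c) / (2.0 * c)
    return lower_hat / c - shift, upper_hat / c - shift


# Hypothetical values: equal imperfect priors, rho_hat = 0.8, beta = 0.9.
lower, upper = true_error_bounds_from_rho(0.8, 0.5, 0.9)
```

For small values of the coefficient the lower bound can come out negative, in which case it carries no information beyond $P_e \geq 0$.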
Similarly, using the relations developed in section 2, other probabilistic
distance measures can be studied [6,7,8].